Distributed object-based storage system that uses pointers stored as object attributes for object analysis and monitoring

ABSTRACT

In a distributed object-based storage system that includes a plurality of object storage devices and one or more clients that access distributed, object-based files from the object storage devices, each of the files being comprised of a plurality of object components residing on different object storage devices, systems and methods that use pointers stored as object attributes for file analysis and monitoring.

This application is a continuation of pending U.S. application Ser. No. 15/261,588, which was filed on Sep. 9, 2016, which is a continuation of U.S. application Ser. No. 13/082,255, which was filed on Apr. 7, 2011, now abandoned, which is a continuation of U.S. application Ser. No. 11/825,921, which was filed on Jul. 10, 2007, now abandoned, which is a continuation of application Ser. No. 10/918,202, which was filed on Aug. 13, 2004, now abandoned, and are incorporated in their entirety herein by reference.

FIELD OF THE INVENTION

The present invention generally relates to data storage methodologies, and, more particularly, to an object-based methodology that uses pointers stored as object attributes to identify file objects that have missing components, and/or to identify file object components stored on a specific object storage device.

BACKGROUND OF THE INVENTION

With increasing reliance on electronic means of data communication, different models to efficiently and economically store a large amount of data have been proposed. A data storage mechanism requires not only a sufficient amount of physical disk space to store data, but various levels of fault tolerance or redundancy (depending on how critical the data is) to preserve data integrity in the event of one or more disk failures.

In a traditional networked storage system, a data storage device, such as a hard disk, is associated with a particular server or a particular server having a particular backup server. Thus, access to the data storage device is available only through the server associated with that data storage device. A client processor desiring access to the data storage device would, therefore, access the associated server through the network and the server would access the data storage device as requested by the client. By contrast, in an object-based data storage system, each object-based storage device communicates directly with clients over a network. An example of an object-based storage system is shown in co-pending, commonly-owned, U.S. patent application Ser. No. 10/109,998, filed on Mar. 29, 2002, titled “Data File Migration from a Mirrored RAID to a Non-Mirrored XOR-Based RAID Without Rewriting the Data,” incorporated by reference herein in its entirety.

Existing object-based storage systems, such as the one described in co-pending application Ser. No. 10/109,998, typically include a plurality of object-based storage devices for storing object components, a metadata server, and one or more clients that access distributed, object-based files on the object storage devices. In such systems, it is typically expensive to identify file objects that need to be reconstructed. For example, a call must initially be made to list (component) objects from each Object-Based Storage Device (OBD). Each OBD in turn would return a list of, for example, 500 objects. These lists must then be merged together to make a list of up to 500 (virtual) objects. Reconstruction then requires retrieval of a map (i.e., layout information showing the physical location on the OBDs where each component of an object resides) from one component of each object to determine if any component of the object was on a non-working OBD. Only after these steps are done can the object be reconstructed. Given an average file size of two components (e.g., 64 k per component) and a typical number of OBDs of 10, only about 11% of the objects for which attributes are retrieved in the reconstruction process need to be reconstructed.

What is needed is an improvement over existing systems that provides a more efficient system and method for identifying file objects that require reconstruction.

SUMMARY OF THE INVENTION

The present invention is directed to a distributed object-based storage system that includes a plurality of object storage devices and one or more clients that access distributed, object-based files from the object storage devices, each of the files being comprised of a plurality of object components residing on different object storage devices. As explained below, the present invention provides several systems and methods that use pointers stored as object attributes for file analysis and monitoring in distributed object-based storage systems.

In accordance with a first aspect, the present invention provides a system and method for detecting files with one or more missing components. For each component of each file, a pointer is stored in an attribute field of the component, wherein the pointer points to a further component of the file. Files with one or more missing components are identified by attempting to traverse the components of each file using the pointers. A file is determined to have one or more missing components if all components associated with the file cannot be traversed using the pointers.

In accordance with a second aspect, the present invention provides a system and method for identifying files containing at least one component on a specific object storage device. For each component of each file, a pointer is stored in an attribute field of the component, wherein the pointer points to a further component of the file. Files with at least one component having, in its attribute field, a pointer that points to a further component residing on the specific object storage device are then identified.

In accordance with a third aspect, the present invention is directed to a system and method for identifying files that are missing components. For at least one component of each file, a count value is stored in an attribute field of the component, wherein the count value corresponds to a maximum number of components for the file. For each file, a list of components in the file is retrieved and an attempt is made to retrieve, from an attribute field of at least one component of the file, the count value corresponding to the maximum number of components of the file. For each file, if the count value corresponding to the maximum number of components of the file was successfully retrieved from an attribute field of at least one component of the file, a number of components on the list is compared to the count value in order to determine whether the file has fewer components than the count value. If the number of components on the list is less than the count value, the file is flagged as missing at least one component. In one embodiment, a file is also identified as having at least one missing component if the attempt to retrieve the count value from an attribute field of at least one component of the file is unsuccessful.

BRIEF DESCRIPTION OF THE DRAWING

The accompanying drawing, which is included to provide a further understanding of the invention and is incorporated in and constitutes a part of this specification, illustrates embodiments of the invention that together with the description serve to explain the principles of the invention:

FIG. 1 illustrates an exemplary network-based file storage system designed around Object-Based Secure Disks (OBDs).

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawing. It is to be understood that the figures and descriptions of the present invention included herein illustrate and describe elements that are of particular relevance to the present invention, while eliminating, for purposes of clarity, other elements found in typical data storage systems or networks.

FIG. 1 illustrates an exemplary network-based file storage system 100 designed around Object-Based Secure Disks (OBDs) 20. File storage system 100 is implemented via a combination of hardware and software units and generally consists of manager software (simply, the “manager”) 10, OBDs 20, clients 30 and metadata server 40. It is noted that each manager is an application program code or software running on a corresponding server. Clients 30 may run different operating systems, and thus present an operating system-integrated file system interface. Metadata stored on server 40 may include file and directory object attributes as well as directory object contents. The term “metadata” generally refers not to the underlying data itself, but to the attributes or information that describe that data.

FIG. 1 shows a number of OBDs 20 attached to the network 50. An OBD 20 is a physical disk drive that stores data files in the network-based system 100 and may have the following properties: (1) it presents an object-oriented interface (rather than a sector-oriented interface); (2) it attaches to a network (e.g., the network 50) rather than to a data bus or a backplane (i.e., the OBDs 20 may be considered as first-class network citizens); and (3) it enforces a security model to prevent unauthorized access to data stored thereon.

The fundamental abstraction exported by an OBD 20 is that of an “object,” which may be defined as a variably-sized ordered collection of bits. Contrary to the prior art block-based storage disks, OBDs do not export a sector interface at all during normal operation. Objects on an OBD can be created, removed, written, read, appended to, etc. OBDs do not make any information about particular disk geometry visible, and implement all layout optimizations internally, utilizing higher-level information that can be provided through an OBD's direct interface with the network 50. In one embodiment, each data file and each file directory in the file system 100 are stored using one or more OBD objects. Because of object-based storage of data files, each file object may generally be read, written, opened, closed, expanded, created, deleted, moved, sorted, merged, concatenated, named, renamed, and include access limitations. Each OBD 20 communicates directly with clients 30 on the network 50, possibly through routers and/or bridges. The OBDs, clients, managers, etc., may be considered as “nodes” on the network 50. In system 100, no assumption needs to be made about the network topology except that each node should be able to contact every other node in the system. Servers (e.g., metadata servers 40) in the network 50 merely enable and facilitate data transfers between clients and OBDs, but the servers do not normally implement such transfers.

Logically speaking, various system “agents” (i.e., the managers 10, the OBDs 20 and the clients 30) are independently-operating network entities. Manager 10 may provide day-to-day services related to individual files and directories, and manager 10 may be responsible for all file- and directory-specific states. Manager 10 creates, deletes and sets attributes on entities (i.e., files or directories) on clients' behalf. (Clients may also set attributes themselves.) Manager 10 also specifies the layout of the data on the OBDs for performance and fault tolerance. “Aggregate” objects are objects that use OBDs in parallel and/or in redundant configurations, yielding higher availability of data and/or higher I/O performance. Aggregation is the process of distributing a single data file or file directory over multiple OBD objects, for purposes of performance (parallel access) and/or fault tolerance (storing redundant information). The aggregation scheme associated with a particular object is stored as an attribute of that object on an OBD 20. A system administrator (e.g., a human operator or software) may choose any aggregation scheme for a particular object. Both files and directories can be aggregated. In one embodiment, a new file or directory inherits the aggregation scheme of its immediate parent directory, by default. A change in the layout of an object may cause a change in the layout of its parent directory. Manager 10 may be allowed to make layout changes for purposes of load or capacity balancing.

The manager 10 may also allow clients to perform their own I/O to aggregate objects (which allows a direct flow of data between an OBD and a client), as well as providing proxy service when needed. As noted earlier, individual files and directories in the file system 100 may be represented by unique OBD objects. Manager 10 may also determine exactly how each object will be laid out, i.e., on which OBD or OBDs that object will be stored, whether the object will be mirrored, striped, parity-protected, etc. Manager 10 may also provide an interface by which users may express minimum requirements for an object's storage (e.g., “the object must still be accessible after the failure of any one OBD”).

Each manager 10 may be a separable component in the sense that the manager 10 may be used for other file system configurations or data storage system architectures.

In one embodiment, the topology for the system 100 may include a “file system layer” abstraction and a “storage system layer” abstraction. The files and directories in the system 100 may be considered to be part of the file system layer, whereas data storage functionality (involving the OBDs 20) may be considered to be part of the storage system layer. In one topological model, the file system layer may be on top of the storage system layer.

A storage access module (SAM) (not shown) is a program code module that may be compiled into managers and clients. The SAM includes an I/O execution engine that implements simple I/O, mirroring, and map retrieval algorithms discussed below. The SAM generates and sequences the OBD-level operations necessary to implement system-level I/O operations, for both simple and aggregate objects.

Each manager 10 maintains global parameters, notions of what other managers are operating or have failed, and provides support for up/down state transitions for other managers. A benefit to the present system is that the location information describing at what data storage device (i.e., an OBD) or devices the desired data is stored may be located at a plurality of OBDs in the network. Therefore, a client 30 need only identify one of a plurality of OBDs containing location information for the desired data to be able to access that data. The data may be returned to the client directly from the OBDs without passing through a manager.

In one embodiment of the present invention, each object (e.g., file or directory) stored in distributed object-based storage system 100 is formed of a plurality of component objects that reside on different OBDs 20. Every component object stored on a given OBD 20 has an associated pointer that is stored as an attribute of the component object on the given OBD 20. Each such pointer ‘points’ to the next component in the object, with the last component pointing back to the first. In this way, the pointers form a ring. As an example, if an object is composed of three components, A, B, C that are stored respectively on OBD1, OBD2, OBD3, the pointer stored as an object attribute of component A on OBD1 would have the value OBD2, the pointer stored as an object attribute of component B on OBD2 would have the value OBD3, and the pointer stored as an object attribute of component C on OBD3 would have the value OBD1. As explained more fully below, if one of the OBDs fails, the pointers provide an efficient way of finding which objects were affected by the failure. For example, if OBD1 fails, manager 10 can perform a list attributes operation on each remaining OBD (i.e., OBD2 and OBD3 in the example) and analyze the results in order to quickly identify those objects that have pointers of value OBD1. Use of the pointers streamlines the process of identifying objects on OBD1 that require reconstruction, and represents an improvement over prior systems by eliminating the need to retrieve attributes for all objects in the system (e.g., maps for all objects in the system) in order to identify objects on the failed device (e.g., OBD1).
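The pointer ring and the failed-device lookup described above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical in-memory dictionary in place of real OBD attribute storage; the names `build_ring` and `objects_affected_by` are illustrative and do not come from the specification.

```python
from typing import Dict, List, Set

# Hypothetical model: OBD name -> {object id -> pointer attribute (name of next OBD)}.
ObdAttrs = Dict[str, Dict[str, str]]


def build_ring(attrs: ObdAttrs, object_id: str, obd_names: List[str]) -> None:
    """Store, with each component, a pointer naming the OBD of the next component.

    The last component points back to the first, so the pointers form a ring.
    """
    n = len(obd_names)
    for i, name in enumerate(obd_names):
        attrs.setdefault(name, {})[object_id] = obd_names[(i + 1) % n]


def objects_affected_by(attrs: ObdAttrs, failed_obd: str) -> Set[str]:
    """List attributes on every surviving OBD and collect the objects whose
    pointer value names the failed OBD."""
    affected: Set[str] = set()
    for name, components in attrs.items():
        if name == failed_obd:
            continue  # the failed device cannot be queried
        for object_id, next_obd in components.items():
            if next_obd == failed_obd:
                affected.add(object_id)
    return affected


attrs: ObdAttrs = {}
build_ring(attrs, "file-1", ["OBD1", "OBD2", "OBD3"])  # components A, B, C
build_ring(attrs, "file-2", ["OBD2", "OBD3"])          # no component on OBD1
affected = objects_affected_by(attrs, "OBD1")
```

In this sketch only `file-1` is reported, because the component on OBD3 carries a pointer of value OBD1; no map retrieval for `file-2` is needed.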

The aforementioned pointers may be used for performing other object analysis and monitoring functions. For example, the pointers may be used by manager 10 to perform detection of files with one or more missing components. In this embodiment, manager 10 attempts to traverse the components of each file object using the pointers. If manager 10 is unable to traverse the “ring” formed by the pointers of a given file object, manager 10 determines that the given object is missing one or more component objects and optionally flags the file object for reconstruction.
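The ring traversal described above can be sketched as follows. This is a hedged illustration, assuming the same hypothetical in-memory model (OBD name mapped to per-object pointer attributes); the function name `ring_is_complete` and the sample data are assumptions, not part of the specification.

```python
from typing import Dict, Set

# Hypothetical model: OBD name -> {object id -> pointer attribute (name of next OBD)}.
ObdAttrs = Dict[str, Dict[str, str]]


def ring_is_complete(attrs: ObdAttrs, object_id: str, start_obd: str) -> bool:
    """Follow the pointers from start_obd; return True only if the walk
    returns to start_obd without hitting a missing component."""
    visited: Set[str] = set()
    current = start_obd
    while current not in visited:
        visited.add(current)
        next_obd = attrs.get(current, {}).get(object_id)
        if next_obd is None:
            return False  # component (or its pointer) is missing on this OBD
        current = next_obd
    # The walk revisited an OBD; the ring is intact only if that OBD is the start.
    return current == start_obd


attrs: ObdAttrs = {
    "OBD1": {"file-1": "OBD2", "file-2": "OBD2"},
    "OBD2": {"file-1": "OBD3", "file-2": "OBD3"},
    "OBD3": {"file-1": "OBD1"},  # the file-2 component on OBD3 is missing
}

needs_reconstruction = [
    object_id
    for object_id in ("file-1", "file-2")
    if not ring_is_complete(attrs, object_id, "OBD1")
]
```

Here `file-2` is flagged because its walk stops at OBD3, where no component is found, while the walk for `file-1` returns to OBD1.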

The pointers may be used in a further way to identify file objects that are missing components. In this embodiment, for at least one component (stored on a given OBD 20) of each file object, a count value is stored (on the given OBD 20) in an attribute field of the component. The count value corresponds to a maximum number of components for the file. In order to identify file objects that are missing components, manager 10 retrieves a list of components in each file in the system. For each file in the system, manager 10 also attempts to retrieve, from an attribute field associated with each component of the file, the count value corresponding to the maximum number of components of the file. If manager 10 successfully retrieves the count value for a given file, the number of components on the list (i.e., the number of components of the file previously retrieved by manager 10) is compared to the count value. If the number of components on the list is less than the count value, manager 10 flags the file as missing at least one component. In a further embodiment, manager 10 also identifies a file as having at least one missing component if the attempt to retrieve the count value from an attribute field of a component of the file was unsuccessful.
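The count-value comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: each listed component is modeled as a plain dictionary of attributes, and the attribute key `component_count`, the function name `is_missing_components`, and the `flag_if_count_unreadable` parameter are hypothetical names introduced here.

```python
from typing import Dict, List, Optional


def is_missing_components(
    component_attrs: List[Dict[str, int]],
    flag_if_count_unreadable: bool = True,
) -> bool:
    """Compare the number of listed components with the stored count value.

    component_attrs holds one attribute dictionary per component found when
    listing the file. The count value need only be present on at least one.
    """
    count: Optional[int] = next(
        (a["component_count"] for a in component_attrs if "component_count" in a),
        None,
    )
    if count is None:
        # Further embodiment: an unsuccessful retrieval also marks the file.
        return flag_if_count_unreadable
    return len(component_attrs) < count


complete_file = [{"component_count": 3}, {}, {}]
short_file = [{"component_count": 3}, {}]
no_count_file = [{}, {}]

results = [
    is_missing_components(complete_file),
    is_missing_components(short_file),
    is_missing_components(no_count_file),
]
```

Passing `flag_if_count_unreadable=False` models the base embodiment, in which a file is flagged only when the count value was retrieved and exceeds the number of listed components.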

Finally, it will be appreciated by those skilled in the art that changes could be made to the embodiments described above without departing from the broad inventive concept thereof. It is understood, therefore, that this invention is not limited to the particular embodiments disclosed, but is intended to cover modifications within the spirit and scope of the present invention as defined in the appended claims.

What is claimed is:
 1. In a distributed object-based storage system that includes a plurality of object storage devices, and one or more clients that access distributed, object-based files from the object storage devices, each of said files being comprised of a plurality of object components residing on different object storage devices, a method for identifying files that are missing components, comprising: for at least one component of each file, storing on the object storage device storing the component a single pointer having a count value in an attribute field of the component, wherein the count value of the single pointer corresponds to a maximum number of components for the file, and wherein the single pointer points to a further component of the file such that a last single component points back to a first single component to form a ring, and the last single component is different from the first single component, and the single pointer in the last component includes a value of the first component; and for each file, using at least one network node to retrieve a list of components in the file and attempt to retrieve from an attribute field of at least one component of the file, the count value of the single pointer corresponding to the maximum number of components of the file; and for each file, if the count value of the single pointer corresponding to the maximum number of components of the file was successfully retrieved from an attribute field of at least one component of the file, using at least one network node to compare the number of components on the list to the count value of the single pointer and determine that the file has the missing components if the file has fewer components than the count value of the single pointer.
 2. The method of claim 1, wherein a file is identified as having at least one missing component if the attempt to retrieve the count value from an attribute field of at least one component of the file is unsuccessful.
 3. In a distributed object-based storage system that includes a plurality of object storage devices, and one or more clients that access distributed, object-based files from the object storage devices, each of said files being comprised of a plurality of object components residing on different object storage devices, a system for identifying files that are missing components, comprising: at least one server that, for at least one component of each file, stores on the object storage device storing the component a single pointer having a count value in an attribute field of the component, wherein the count value of the single pointer corresponds to a maximum number of components for the file, and wherein the single pointer points to a further component of the file such that a last single component points back to a first single component to form a ring, and the last single component is different from the first single component, and the single pointer in the last component includes a value of the first component; and wherein, for each file, the at least one server retrieves a list of components in the file and attempts to retrieve from an attribute field of at least one component of the file the count value corresponding to the maximum number of components of the file; and wherein, for each file, the at least one server compares the number of components on the list to the count value of the single pointer and determines that the file has the missing components if the file has fewer components than the count value.
 4. The system of claim 3, wherein the at least one server identifies a file as having at least one missing component if the attempt to retrieve the count value from an attribute field of at least one component of the file is unsuccessful.